51 research outputs found

    Genomic Prediction Using Canopy Coverage Image and Genotypic Information in Soybean via a Hybrid Model

    Prediction techniques are important in plant breeding as they provide a tool for selection that is more efficient and economical than traditional phenotypic and pedigree-based selection. The conventional genomic prediction models use molecular marker information to predict the phenotype. With the development of new phenomics techniques, we have the opportunity to collect image data on the plants and to extend the traditional genomic prediction models by incorporating a diverse set of information collected on the plants. In our research, we developed a hybrid matrix model that incorporates molecular marker and canopy coverage information as a weighted linear combination to predict grain yield for the soybean nested association mapping (SoyNAM) panel. To obtain the testing and training sets, we clustered the individuals based on their marker and canopy information using 2 different clustering techniques, and we compared 5 different cross-validation schemes. The results showed that the predictive ability of the models was highest when both the canopy and marker information were included and lowest when only the canopy information was included
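
A minimal sketch of the "weighted linear combination" idea in the abstract above. The abstract does not give the form of the two matrices or the weight, so the centered cross-product similarity, the names `relationship_matrix` and `hybrid_matrix`, the weight `w`, and the toy data are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def relationship_matrix(X):
    """Centered cross-product similarity between the rows (individuals) of X."""
    Z = X - X.mean(axis=0)           # center each column (marker or canopy feature)
    return Z @ Z.T / X.shape[1]      # scale by the number of columns

def hybrid_matrix(G, C, w):
    """Weighted linear combination of a marker matrix G and a canopy matrix C."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * G + (1.0 - w) * C

# Toy data (hypothetical): 6 individuals, 50 markers, 8 canopy measurements.
rng = np.random.default_rng(0)
markers = rng.integers(0, 3, size=(6, 50)).astype(float)  # 0/1/2 allele dosages
canopy = rng.random((6, 8))                               # canopy coverage values

G = relationship_matrix(markers)
C = relationship_matrix(canopy)
H = hybrid_matrix(G, C, w=0.5)
```

With `w = 1` the hybrid matrix reduces to the marker-only matrix and with `w = 0` to the canopy-only matrix, which correspond to the two single-source models the abstract compares against.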

    Parametric and Nonparametric Statistical Methods for Genomic Selection of Traits with Additive and Epistatic Genetic Architectures

    Parametric and nonparametric methods have been developed for purposes of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods including least squares regression, ridge regression, Bayesian ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including Nadaraya-Watson estimator, reproducing kernel Hilbert space, support vector machine regression, and neural networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely on epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., proportion of phenotypic variability, had the second greatest impact on estimates of accuracy and MSE

    Application of Response Surface Methods To Determine Conditions for Optimal Genomic Prediction

    An epistatic genetic architecture can have a significant impact on prediction accuracies of genomic prediction (GP) methods. Machine learning methods predict traits comprised of epistatic genetic architectures more accurately than statistical methods based on additive mixed linear models. The differences between these types of GP methods suggest a diagnostic for revealing genetic architectures underlying traits of interest. In addition to genetic architecture, the performance of GP methods may be influenced by the sample size of the training population, the number of QTL, and the proportion of phenotypic variability due to genotypic variability (heritability). Possible values for these factors and the number of combinations of the factor levels that influence the performance of GP methods can be large. Thus, efficient methods for identifying combinations of factor levels that produce the most accurate GPs are needed. Herein, we employ response surface methods (RSMs) to find the experimental conditions that produce the most accurate GPs. We illustrate RSM with an example of simulated doubled haploid populations and identify the combination of factors that maximizes the difference between prediction accuracies of best linear unbiased prediction (BLUP) and support vector machine (SVM) GP methods. The greatest impact on the response is due to the genetic architecture of the population, heritability of the trait, and the sample size. When epistasis is responsible for all of the genotypic variance, heritability is equal to one, and the sample size of the training population is large, the advantage of using the SVM method vs. the BLUP method is greatest. However, except for values close to the maximum, most of the response surface shows little difference between the methods. We also determined that the conditions resulting in the greatest prediction accuracy for BLUP occurred when genetic architecture consists solely of additive effects, and heritability is equal to one
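
The core step of a response surface analysis can be sketched as fitting a second-order polynomial to a response and solving for its stationary point. The sketch below assumes two factors and a noise-free toy response; the function names and the toy optimum at (0.3, 0.7) are hypothetical and are not taken from the study.

```python
import numpy as np

def fit_quadratic_surface(x1, x2, y):
    """Least-squares fit of
    y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2."""
    D = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return beta

def stationary_point(beta):
    """Solve grad(y) = 0 for the fitted surface.

    The point is a maximum only if the matrix B below is negative definite.
    """
    _, b1, b2, b11, b22, b12 = beta
    B = np.array([[2 * b11, b12], [b12, 2 * b22]])
    return np.linalg.solve(B, -np.array([b1, b2]))

# Toy response (hypothetical) on a 5 x 5 grid, with a known maximum at (0.3, 0.7).
g1, g2 = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5))
x1, x2 = g1.ravel(), g2.ravel()
y = 1.0 - (x1 - 0.3) ** 2 - 2.0 * (x2 - 0.7) ** 2

beta = fit_quadratic_surface(x1, x2, y)
optimum = stationary_point(beta)
```

In a setting like the one described, the response could be the difference in prediction accuracy between two methods and the factors could be quantities such as heritability or training sample size.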

    Response Surface Analysis of Genomic Prediction Accuracy Values Using Quality Control Covariates in Soybean

    An important and broadly used tool for selection purposes and to increase yield and genetic gain in plant breeding programs is genomic prediction (GP). Genomic prediction is a technique where molecular marker information and phenotypic data are used to predict the phenotype (e.g., yield) of individuals for which only marker data are available. Higher prediction accuracy can be achieved not only by using efficient models but also by using quality molecular marker and phenotypic data. The steps of a typical quality control (QC) of marker data include the elimination of markers with a certain level of minor allele frequency (MAF) and missing marker values and the imputation of missing marker values. In this article, we evaluated how the prediction accuracy is influenced by the combination of 12 MAF values, 27 different percentages of missing marker values, and 2 imputation techniques (IT; naïve and Random Forest (RF)). We constructed a response surface of prediction accuracy values for the two ITs as a function of MAF and percentage of missing marker values using soybean data from the University of Nebraska–Lincoln Soybean Breeding Program. We found that both the genetic architecture of the trait and the IT affect the prediction accuracy, implying that we have to be careful how we perform QC on the marker data. For the corresponding combinations of MAF and percentage of missing values, we observed that implementing the RF imputation increased the number of markers by 2 to 5 times relative to the simple naïve imputation method, which is based on the mean allele dosage of the non-missing values at each locus. We conclude that there is no single strategy (combination of the QCs and imputation method) that outperforms the others for all traits
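
The QC steps named in the abstract above (filtering on MAF and on the share of missing values, then naïve imputation with the mean allele dosage of the non-missing values) can be sketched as follows. The function name, the 0/1/2 dosage coding, the order of the steps, and the thresholds are assumptions made for illustration; the study's own pipeline is not given in the abstract.

```python
import numpy as np

def quality_control(M, maf_min, max_missing):
    """Filter marker columns, then mean-impute the remaining missing values.

    M: individuals x markers array of allele dosages (0, 1, 2); NaN = missing.
    Returns the cleaned matrix and a boolean mask of the retained markers.
    """
    n_obs = (~np.isnan(M)).sum(axis=0)                 # observed calls per marker
    missing_rate = 1.0 - n_obs / M.shape[0]
    mean_dosage = np.divide(np.nansum(M, axis=0), n_obs,
                            out=np.zeros(M.shape[1]), where=n_obs > 0)
    freq = mean_dosage / 2.0                           # frequency of the counted allele
    maf = np.minimum(freq, 1.0 - freq)                 # minor allele frequency

    keep = (n_obs > 0) & (missing_rate <= max_missing) & (maf >= maf_min)
    kept = M[:, keep].copy()
    rows, cols = np.where(np.isnan(kept))
    kept[rows, cols] = mean_dosage[keep][cols]         # naive mean imputation
    return kept, keep

# Toy matrix (hypothetical): 4 individuals x 4 markers.
M = np.array([[0.0,    2.0, np.nan, 0.0],
              [1.0,    2.0, np.nan, 0.0],
              [2.0,    2.0, 1.0,    np.nan],
              [np.nan, 2.0, np.nan, 0.0]])
clean, keep = quality_control(M, maf_min=0.05, max_missing=0.5)
```

In this toy case only the first marker survives: the second and fourth are monomorphic (MAF of 0) and the third is missing in 75% of individuals.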

    Increasing Predictive Ability by Modeling Interactions between Environments, Genotype and Canopy Coverage Image Data for Soybeans

    Phenomics is a new area that offers numerous opportunities for application in plant breeding. One possibility is to exploit this type of information obtained from early stages of the growing season by combining it with genomic data. This opens an avenue for improving the predictive ability of the common models used for genomic prediction. Imagery (canopy coverage) data recorded between days 14–71 using two collection methods (ground information in 2013 and 2014; aerial information in 2014 and 2015) on a soybean nested association mapping population (SoyNAM) were used to calibrate the prediction models together with the inclusion of several types of interactions between canopy coverage data, environments, and genomic data. Three different scenarios that breeders might face when testing lines in fields were considered: (i) incomplete field trials (CV2); (ii) newly developed lines (CV1); and (iii) predicting lines in unobserved environments (CV0). Two different traits were evaluated in this study: yield and days to maturity (DTM). Results showed improvements in the predictive ability for yield with respect to those models that solely included genomic data. These relative improvements were 27–123%, 27–148%, and 65–165% for CV2, CV1, and CV0, respectively. No major changes were observed for DTM. Similar improvements were observed for both traits when the reduced canopy information for days 14–33 was used to build the training-testing relationships, showing a clear advantage of using phenomics in very early stages of the growing season

    Principal variable selection to explain grain yield variation in winter wheat from features extracted from UAV imagery

    Background: Automated phenotyping technologies are continually advancing the breeding process. However, collecting various secondary traits throughout the growing season and processing massive amounts of data still take great effort and time. Selecting a minimum number of secondary traits that have the maximum predictive power has the potential to reduce phenotyping efforts. The objective of this study was to select principal features extracted from UAV imagery and critical growth stages that contributed the most in explaining winter wheat grain yield. Five dates of multispectral images and seven dates of RGB images were collected by a UAV system during the spring growing season in 2018. Two classes of features (variables), totaling 172 variables, were extracted for each plot from the vegetation index and plant height maps, including pixel statistics and dynamic growth rates. A parametric algorithm, LASSO regression (the least absolute shrinkage and selection operator), and a non-parametric algorithm, random forest, were applied for variable selection. The regression coefficients estimated by LASSO and the permutation importance scores provided by random forest were used to determine the ten most important variables influencing grain yield from each algorithm. Results: Both selection algorithms assigned the highest importance score to the variables related to plant height around the grain filling stage. Some vegetation-index-related variables were also selected by the algorithms, mainly at early to mid growth stages and during senescence. Compared with the yield prediction using all 172 variables derived from measured phenotypes, prediction using the selected variables performed comparably or even better. We also noticed that the prediction accuracy on the adapted NE lines (r = 0.58–0.81) was higher than on the other lines (r = 0.21–0.59) included in this study with different genetic backgrounds.
    Conclusions: With the ultra-high resolution plot imagery obtained by the UAS-based phenotyping we are now able to derive more features, such as the variation of plant height or vegetation indices within a plot rather than just an averaged number, that are potentially very useful for breeding purposes. However, too many features or variables can be derived in this way. The promising results from this study suggest that a selected subset of those variables can achieve grain yield prediction accuracies comparable to those of the full set while possibly allowing a better allocation of effort and resources for phenotypic data collection and processing

    Enhancing Hybrid Prediction in Pearl Millet Using Genomic and/or Multi- Environment Phenotypic Information of Inbreds

    Genomic selection (GS) is an emerging methodology that helps select superior lines among experimental cultivars in plant breeding programs. It offers the opportunity to increase the productivity of cultivars by delivering increased genetic gains and reducing the breeding cycles. This methodology requires inexpensive and sufficiently dense marker information to be successful, and with whole genome sequencing, it has become an important tool in many crops. The recent assembly of the pearl millet genome has made it possible to employ GS models to improve the selection procedure in pearl millet breeding programs. Here, three GS models were implemented and compared using grain yield and dense molecular marker information of pearl millet obtained from two different genotyping platforms (C [conventional GBS RAD-seq] and T [tunable GBS tGBS]). The models were evaluated using three different cross-validation (CV) schemes mimicking real situations that breeders face in breeding programs: CV2 resembles an incomplete field trial, CV1 predicts the performance of untested hybrids, and CV0 predicts the performance of hybrids in unobserved environments. We found that (i) adding phenotypic information of parental inbreds to the calibration sets improved predictive ability, (ii) accounting for genotype-by-environment interaction also increased the performance of the models, and (iii) superior strategies should consider the use of the molecular markers derived from the T platform (tGBS)
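
The three cross-validation schemes differ in what is held out together: individual records (CV2), whole lines or hybrids (CV1), or whole environments (CV0). A rough sketch of fold assignment under that reading is given below. The function `cv_folds`, its parameters, and the toy layout are hypothetical. The CV2 branch uses plain random assignment, which does not guarantee that a held-out genotype is still observed in another environment, as stricter implementations may require.

```python
import numpy as np

def cv_folds(lines, envs, scheme, n_folds=5, seed=0):
    """Return a fold label for every (line, environment) record."""
    lines, envs = np.asarray(lines), np.asarray(envs)
    rng = np.random.default_rng(seed)
    if scheme == "CV2":
        # Incomplete trials: single records are held out at random.
        return rng.permutation(len(lines)) % n_folds
    if scheme == "CV1":
        # Untested genotypes: all records of a line share one fold.
        uniq, idx = np.unique(lines, return_inverse=True)
        return (rng.permutation(len(uniq)) % n_folds)[idx]
    if scheme == "CV0":
        # Unobserved environments: leave one environment out at a time.
        _, idx = np.unique(envs, return_inverse=True)
        return idx
    raise ValueError(f"unknown scheme: {scheme}")

# Toy layout (hypothetical): 10 lines, each observed in 3 environments.
lines = np.repeat(np.arange(10), 3)
envs = np.tile(np.array(["E1", "E2", "E3"]), 10)

cv2 = cv_folds(lines, envs, "CV2")
cv1 = cv_folds(lines, envs, "CV1")
cv0 = cv_folds(lines, envs, "CV0")
```

Predicting fold `k` then means training on all records whose label differs from `k` and evaluating on the records labelled `k`.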

    Evaluating dimensionality reduction for genomic prediction

    The development of genomic selection (GS) methods has allowed plant breeding programs to select favorable lines using genomic data before performing field trials. Improvements in genotyping technology have yielded high-dimensional genomic marker data which can be difficult to incorporate into statistical models. In this paper, we investigated the utility of applying dimensionality reduction (DR) methods as a pre-processing step for GS methods. We compared five DR methods and studied the trend in the prediction accuracies of each method as a function of the number of features retained. The effect of DR methods was studied using three models that involved the main effects of line, environment, marker, and the genotype by environment interactions. The methods were applied on a real data set containing 315 lines phenotyped in nine environments with 26,817 markers each. Regardless of the DR method and prediction model used, only a fraction of features was sufficient to achieve maximum correlation. Our results underline the usefulness of DR methods as a key pre-processing step in GS models to improve computational efficiency in the face of ever-increasing size of genomic data
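
As an illustration of dimensionality reduction as a pre-processing step, the sketch below projects a marker matrix onto its leading principal components via the singular value decomposition. The abstract does not name the five DR methods it compares, so PCA is used here only as a familiar stand-in; the function name and the toy dimensions are assumptions.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the first k principal components.

    Returns the n x k score matrix and the share of total variance retained.
    """
    Xc = X - X.mean(axis=0)                            # center each marker column
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                          # principal component scores
    explained = float((s[:k] ** 2).sum() / (s ** 2).sum())
    return scores, explained

# Toy marker matrix (hypothetical): 30 lines x 200 markers with 0/1/2 dosages.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(30, 200)).astype(float)

scores, explained = pca_reduce(X, k=10)
```

The reduced `scores` matrix can stand in for the full marker matrix in a downstream prediction model, and varying `k` traces out accuracy as a function of the number of features retained.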